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Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For 
polynomials in one variable, Horner's method provides a scheme for producing a computationally 
efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this 
leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring vari- 
able first are used. This simple textbook algorithm has given remarkably efficient results. Finding 
better algorithms has proved difficult. In trying to improve upon the greedy scheme we have im- 
plemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. 
This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by 
factors up to two. 
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I. INTRODUCTION 

Polynomials are fundamental objects in mathemat- 
ics and reducing the cost of evaluating polynomials is 
a classic problem in computer science. Applications 
abound, ranging from fast calculation on embedded de- 
vices and real-time calculations to high-energy physics 
(HEP), where one needs to perform Monte Carlo inte- 
grations of extremely large polynomials in many vari- 
ables Numerous methods to optimize polyno- 
mial evaluation have been proposed, such as Horner's 
method 043; common subexpression elimination Q, 
Breuer's growth algorithm [j| [l(| and, recently, partial 
syntactic factorization (Tlj . 

For a polynomial in one variable, Horner's method pro- 
vides a computationally efficient form for evaluating it: 



a(x) = ^aiX* = ao + x(ai+x(a,2 + x( \-x-a n ))). (1) 

»=o 

With this representation a dense polynomial of degree n 
can be evaluated with n multiplications and n additions, 
giving an evaluation cost of 2n. Here it is assumed that 
the cost of addition and multiplication are equal. 

For multivariate polynomials Horner's method can be 
generalized. To do so one chooses a variable and ap- 
plies Eqn. ([1]), thereby treating the other variables as 
constants. Afterwards another variable is chosen and the 
same process is applied to the terms within the paren- 
theses. This is repeated until all variables are processed. 
As an example, for the polynomial a = y — 3x + hxz + 
2x 2 yz — 3x 2 y 2 z + hx 2 y 2 z 2 and the order x < y < z this 
results in the following expression 

a = y + x(-3 + 5z + x(y(2z + y(z{-3 + 5z))))). (2) 

Regarding the evaluation cost, the original expression 
uses 5 additions and 18 multiplications, while the Horner 
form uses 5 additions but only 8 multiplications. In gen- 
eral, applying a Horner scheme keeps the number of ad- 



ditions constant, but reduces the number of multiplica- 
tions. 

After transforming a polynomial with Horner's 
method, the code can be further improved by perform- 
ing a common subexpression elimination (CSE) [8]. In 
Eqn. ([2]), the subexpression —3 + 5z appears twice. Elim- 
inating the common subexpression results in the code 



T = -3 + 5z 
a = y + x(T H 



x(y(2z + y(zT)))), 



(3) 



which uses only 4 additions and 7 multiplications. The 
code optimization package Haggies [l2| implements this 
method of Horner schemes followed by CSE. 

Finding the optimal order of variables for the Horner 
scheme is still an open problem for all but the smallest 
polynomials, which are studied in Ref. Q. Different or- 
ders may impact the cost of the resulting code, although 
no thorough study of this has been made to the authors' 
knowledge. Simple algorithms have been proposed in the 
literature, such as most-occurring variable first, which re- 
sults in the highest decrease of the cost at that particular 
step. This is also the order that is used in Haggies. 

We studied the results of choosing different orders of 
variables for the Horner scheme and discovered that this 
order greatly affects the results, sometimes improving the 
cost by factors up to two. Unfortunately, most often it 
is impossible to perform an exhaustive search through 
all Horner schemes, since their number increases as the 
factorial of the number of variables. Therefore we have 
devised a method to find efficient orders by using Monte 
Carlo tree search (MCTS) [3, LL4(, a recently proposed 
search method from the field of artificial intelligence. 



II. MONTE CARLO TREE SEARCH 

MCTS is a best-first search method that uses random 
sampling to guide the traversal of the search tree. It has 
recently drawn much attention due to its application in 
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FIG. 1: The asymmetric search through an MCTS search tree 
during the search for an efficient Horner scheme. 



the field of computer Go [13j, a classic board game in 
which computers have traditionally played weakly. In 
the past decade the application of MCTS has improved 
the playing strength of computers from the level of ad- 
vanced beginners to the level of strong amateur players. 
MCTS has also been applied successfully in numerous 
other games and optimization problems 

In MCTS the search tree is built in an incremental 
and asymmetric way, see Fig. [TJ During the search the 
traversed part of the search tree is completely in mem- 
ory. For each node MCTS keeps track of the number of 
times it has been visited and the estimated result of that 
node. At each step one node is added to the search tree 
according to a criterion that tells where most likely bet- 
ter results can be found. From that node an outcome is 
sampled and the results of the node and its parents are 
updated. This process is illustrated in Fig. [5J In more 
detail the four steps of the MCTS cycle are the following. 

Selection During the selection step the node which 
most urgently needs expansion is selected. Several cri- 
teria are proposed, but the easiest and most-used is the 
UCT (upper confidence level for trees) criterion 15| : 



UCTi = 



-2a 



21ogn 



(4) 



Here (xi) is the average score of child i, rii is the number 
of times child i has been visited and n is the number of 
times the node itself has been visited. C p is a problem- 
dependent constant that should be determined empiri- 
cally. Starting at the root of the search tree, the most- 
promising child according to this criterion is selected and 
this selection process is repeated recursively until a node 
is reached with unvisited children. The first term of 
Eqn. Q biases in favor of nodes with previous high re- 
wards (exploitation) , while the second term selects nodes 
that have not been visited much (exploration) . Balancing 
exploitation versus exploration is essential for the good 
performance of MCTS. 

Expansion The selection step finishes with a node 
with unvisited children. In the expansion step one of 
these children is added to the tree. 

Simulation In the simulation step a single possible 
outcome is simulated starting from the node that has 
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FIG. 2: The Monte Carlo tree search cycle illustrated. (Figure 
from Ref. 



just been added to the tree. This simulation can con- 
sist of generating a complete random outcome starting 
from this node or can be some known heuristic for the 
search problem. The latter typically works better if spe- 
cific knowledge of the problem is available. 

Backpropagation In the backpropagation step the 
results of the simulation are added to the tree, specifically 
to the path of nodes from the newly-added node to the 
root. Their average results and visit count are updated. 

This MCTS cycle is repeated a fixed number of times 
or until the computational resources are exhausted. After 
that the best found result is returned. 



III. EFFICIENT HORNER SCHEMES 

In the existing code optimization packages that use 
Horner schemes combined with CSE, a simple algorithm 
for the order of the variables is chosen. Widely used 
is the occurrence order, where the variables are sorted 
with respect to the number of occurrences in the poly- 
nomial [6j. The variable that has the largest number of 
occurrences comes first in the order and is the first one 
used in Horner's method. 

To test whether this algorithm gives efficient Horner 
schemes we took a large polynomial with 15 variables, 
a result from HEP calculations, and generated a mil- 
lion random orders which were used for Horner's method 
followed by CSE. The occurrence order performed quite 
well, about a standard deviation above average, but far 
better orders were also found. An interesting feature of 
the orders that led to efficient schemes also showed up: 
these orders all shared the same variables in the trailing 
part of the order. These are the variables that eventu- 
ally show up most often in the common subexpressions. 
These common subexpressions abound in the HEP poly- 
nomials due to much structure, such as combinations of 
coupling constants or dot products and masses. 

Motivated by this observation we use MCTS to deter- 
mine an order of the variables that gives efficient Horner 
schemes. The root of the search tree represents that no 
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variables are chosen yet. This root node has n children, 
with n the number of variables. The other nodes repre- 
sent choices for a number of variables in the trailing part 
of the order. This number equals the depth of the node 
in the search tree. A node at depth d has n — d children: 
the remaining unchosen variables. 

In the simulation step the incomplete order is com- 
pleted with the remaining variables added randomly. 
This complete order is then used for Horner's method 
followed by CSE. The number of operations in this opti- 
mized expression is counted. The selection step uses the 
UCT criterion with as score the number of operations in 
the original expression divided by the number of opera- 
tions in the optimized one. This number increases with 
better orders and is typically of 0(1). The constant C p 
in Eqn. Q must therefore be chosen of that size as well. 
Pseudocode of MCTS generated Horner schemes can be 
found in algorithm [TJ 



Algorithm 1 Pseudocode of MCTS Horner 

1: function MCTSHorner 

2: r <— new node with empty variable order 

3: for i •<— 1 . . . NumberOfTreeExpansions do 

4: s -i— select (r) 

5: s <— expand(s) 

6: x <— simulate(s) 

7: backpropagate(s,a;) 

8: return best optimized expression found 

9: 

10: function SELECt(s) 

11: while s is fully expanded do 

12: s«- argmax ^ + 2C p ,/g los( " (s)) 

children c 01 s * 

13: return s 

14: 

15: function expand(s) 

16: o <— variable order of s 

17: x <— random variable not in o 

18: o <s— append(o,j;) 

19: c <— new node with variable order o 

20: add c to children of s 

21: return c 

22: 

23: function SIMULATe(s) 

24: o <s— variable order of s 

25: while o doesn't contain all variables do 

26: x <— random variable not in o 

27: o <— append(o,a;) 

28: e HornerScheme(expression, o) 

29: e <— CommonSubexpressionElimination(e) 

30 - return NumbcrOiOpcrations(cxprcssion) 

Number OfOpcrations(e) 

31: 

32: function BACKPROPAGATE(s,e>ir) 

33: while s ^ null do 

34: x(s) x(s) + Sx 

35: n{s) i- n(s) + 1 

36: s <— parent of s 



IV. RESULTS 

To test the performance of this method an implementa- 
tion is added to the computer algebra package FORM [l6[ , 
which is widely used for HEP calculations. The results 
of this method are compared to a few existing algo- 
rithms. For comparison we added to Form optimiza- 
tion routines that use occurrence order Horner schemes 
followed by CSE. Furthermore, we compare to the open- 
source code optimization package Haggies [l2[ and the 
results from the paper on the hypergraph method based 
on partial syntactic factorization [11]. We also tried the 
code optimization routines of Mathematica and Maple, 
but their results were not of particular interest. Finally, 
we present the results of the new code optimization rou- 
tines of FORM, which consist of MCTS generated Horner 
schemes followed by greedy optimizations. A detailed de- 
scription of this algorithm will be presented in Ref. (l7j . 
Since this algorithm is an extension of MCTS Horner 
with CSE, it should perform better. 

The polynomials to be optimized consist of two sets. 
The first set of polynomials (taken from Ref. [TTJ] ) are 
the resultants of two polynomials, res x (a(x),b(x)), with 
a i x ) — J2iLo a i xt ancl b(x) — Yn=Q biX 1 1 which is viewed 
as a polynomial in the (m + n + 2) variables a$ and hi. 
The second set consists of a number of large multivariate 
polynomials resulting from HEP calculations @] . For the 
set of resultants we observed that the variables that are 
factored out first in the Horner scheme are critical for 
the performance of MCTS Horner, as opposed to the last 
variables which are important for the HEP polynomials. 
This is probably due to many common subexpressions 
appearing in the HEP polynomials if the right variables 
are chosen last. The resultants don't have that prop- 
erty and are probably more sensitive to a large decrease 
in the number of operations due to the Horner scheme 
itself. This conjecture must be investigated more rigor- 
ously. In the experiments for the resultants MCTS will 
search therefore for the best leading part of the variable 
order, while for the HEP polynomials it will search for 
the best trailing part. 

The results of the optimizations are expressed in the 
number of operations in the final expressions and are in 
Tab. UJ It is clear that MCTS Horner with CSE beats 
the existing algorithms, if the parameters (the exploita- 
tion/exploration constant C p from Eqn. (U) and the num- 
ber of tree expansions N) are chosen properly. 

The effectiveness of MCTS depends heavily on the 
choice for these parameters. The results of MCTS with 
3 000 tree expansions, followed by CSE, as a function of 
C p are in Fig. [3] for a large polynomial from HEP. For 
equal values of C p different results are produced because 
of different seeds of the random number generator. For 
small values of C p , such that MCTS behaves exploitively, 
the method gets trapped in one of the local minima as 
can be seen from the different lines in the left-hand side 
of the figure. For large values of C p , such that MCTS 
behaves exploratively, lots of options are considered and 
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TABLE I: The number of operations in the final expression after applying the optimization methods on the polynomials. 
The results for the hypergraph method are from Ref. we did not have the opportunity to test this method on the HEP 
polynomials. Different MCTS experiments are performed with different values for Cp, which are determined by trial and error. 
These are present in the table underneath their corresponding results. The MCTS results are statistical averages of different 
samples. 



no real minimum is found as can be seen from the cloud 
of points on the right-hand side. For intermediate val- 
ues of C p ~ 1 MCTS balances well between exploitation 
and exploration and finds almost always a Horner scheme 
that is very close to the best one known to us. 

The results of improving this polynomial for different 
numbers of tree expansions are shown in Fig.|4j For small 
numbers of tree expansions it turns out to be good to 
choose a low value for the constant C v (smaller than 0.5). 



The search is then mainly driven by exploitation. MCTS 
quickly searches deep in the tree, most likely around a 
local minimum. This local minimum is explored quite 
well, but the global minimum is likely to be missed. With 
higher numbers of tree expansions a value for C' p in the 
range [0.5; 2] seems suitable. This gives a good balance 
between exploring the whole search tree and exploiting 
the promising nodes. Really high values of C p seem a bad 
choice in general, since promising nodes are not exploited 
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FIG. 3: A scatter plot of the results of MCTS Horner with 
CSE with 3 000 tree expansions for the polynomial HEP (a) as 
a function of the constant Cp that determines the balance be- 
tween exploitation (small values) and exploration (large val- 
ues). Shown are 4 000 randomly chosen values for C p with 
their corresponding results. 



FIG. 4: The results of MCTS Horner with CSE for the poly- 
nomial HEP (a) as function of the exploitation/exploration 
constant C p and the number of tree expansions N. All data 
points are statistical averages over at least 100 samples. The 
fourth line from above (green) consists of the average values 
per Cp of the scatter plot of Fig. 
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TABLE II: The run times of the various optimization algo- 
rithms on the polynomial res(7, 5). All tests are performed 
on 2.4GHz Pentium or Xeon CPUs. 



anymore. Note that these values hold for this particular 
polynomial, and that different polynomials give different 
optimal values for C p and N. The different values for C p 
in Tab. U are determined by trial and error and give de- 
cent results. With some more tuning even better results 
can probably be achieved. Automatic tuning of this pa- 
rameter would be very convenient and is part of ongoing 
research. 

The different algorithms vary a lot regarding the 
consumed computational resources, see Tab. [TTJ MCTS 
Horner with CSE needs considerably more run time than 
a greedy Horner scheme with CSE. This makes sense, 
because MCTS basically does such an operation per tree 
expansion. With only few computational resources avail- 
able it is better to use a greedy Horner scheme than to 
use MCTS. If used it searches through the tree for good 
schemes for too short a time and does not find one, there- 
fore resulting in a bad scheme. Compared to Haggies 
and the hypergraph method, MCTS with 300 expansions 
gives slightly longer run times for slightly better results. 
When the quality of the polynomial evaluation scheme 
is of great importance, it makes sense to spend more 
time to find a better evaluation scheme. With more time 
available, so that large parts of the search tree can be tra- 
versed, MCTS Horner with CSE or greedy optimizations 
outperforms all other methods considerably. 



V. DISCUSSION 

Polynomials are fundamental mathematical objects, 
naturally occurring at many places in mathematics and 
science. Efficient evaluation of polynomials is of great 
importance to many application areas. Horner's method 
is a simple approach straight out of undergraduate algo- 
rithms textbooks. For something so basic, it is remark- 
able that over the years so little improvement has been 
made in finding more efficient evaluation schemes. 

We improve on the traditional multivariate Horner 
schemes, where the variable order is fixed by a simple 
procedure, by employing MCTS. By statistically sam- 
pling the different variable orders it is designed to balance 
exploitation of known good schemes while not forgetting 
to explore unknown schemes as well. 

The basic multivariate Horner schemes generated with 
most-occurring variables ordered first will quickly yield 
schemes that may be efficient enough for many applica- 
tions. More demanding domains, where the expressions 
are large and/or evaluated many times, need better eval- 
uation schemes. For these applications it pays to invest 
the time to generate them. MCTS is suitable for these 
types of applications. Evaluating large expressions for 
Feynman diagrams in HEP is one such domain, where 
large expressions have to be evaluated many times doing 
Monte Carlo integration [H-Q- 

For all examined polynomials MCTS Horner, followed 
by CSE, generated evaluation schemes that were bet- 
ter than any other algorithm that we tried. A further 
analysis of the performance is ongoing research. This 
includes sensitivity analysis to different parameters (ex- 
ploitation/exploration constant, number of tree expan- 
sions), dependence on the polynomial (size, number of 
variables, use of heading or trailing part of the variable 
order) , automatic tuning of the parameters to the polyno- 
mial, better criteria for the selection step, possible heuris- 
tics for the simulation step (such as using the occurrence 
order instead of random completion) , and the paralleliza- 
tion of the algorithm. 
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